The CHiME corpus: a resource and a challenge for computational hearing in multisource environments
Abstract
We present CHiME, a new corpus designed for noise-robust speech processing research. Our goal was to produce material that is both natural (derived from reverberant domestic environments with many simultaneous and unpredictable sound sources) and controlled (providing an enumerated range of SNRs spanning 20 dB). The corpus includes around 40 hours of background recordings made with a head and torso simulator positioned in a domestic setting, and a comprehensive set of binaural impulse responses collected in the same environment. These recordings and impulse responses have been used to embed target utterances from the Grid speech recognition corpus in the CHiME domestic setting. The data have been mixed in a manner that produces a controlled yet natural range of SNRs over which speech separation, enhancement and recognition algorithms can be evaluated. The paper motivates the design of the corpus and describes the collection and post-processing of the data. We also present a set of baseline recognition results.
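The abstract does not spell out the mixing procedure. As a hedged illustration of one way to obtain "controlled yet natural" SNRs, the sketch below convolves a dry utterance with a left/right pair of binaural impulse responses and then, instead of rescaling either signal, searches a long binaural background recording for the offset whose naturally occurring level yields an SNR closest to a target. The pooled-energy SNR definition (energy summed over both ears and over the utterance duration), the exhaustive offset search, and all function names (`spatialise`, `best_offset`, `mix`) are assumptions made for illustration; this is not the CHiME authors' published code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a binaural signal is [left_channel, right_channel].
Binaural = List[List[float]]


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution of a dry signal x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def spatialise(utterance: Sequence[float], brir_left: Sequence[float],
               brir_right: Sequence[float]) -> Binaural:
    """Render a dry (mono) utterance at the two ears of the listener."""
    return [convolve(utterance, brir_left), convolve(utterance, brir_right)]


def energy(channels: Sequence[Sequence[float]]) -> float:
    """Total energy summed over all samples of all channels."""
    return sum(s * s for ch in channels for s in ch)


def snr_db(speech: Binaural, noise: Binaural) -> float:
    """SNR in dB; assumes energy is pooled over both ears and the whole utterance."""
    return 10.0 * math.log10(energy(speech) / energy(noise))


def best_offset(speech: Binaural, background: Binaural, target_db: float,
                hop: int = 1) -> Tuple[int, float]:
    """Return (offset, snr) for the background segment whose SNR is closest to target_db.

    Neither signal is rescaled; only the position within the background changes.
    """
    n = len(speech[0])
    best = None  # (offset, snr, absolute error in dB)
    for start in range(0, len(background[0]) - n + 1, hop):
        segment = [ch[start:start + n] for ch in background]
        if energy(segment) == 0.0:
            continue  # silent segment: SNR is undefined
        snr = snr_db(speech, segment)
        err = abs(snr - target_db)
        if best is None or err < best[2]:
            best = (start, snr, err)
    if best is None:
        raise ValueError("no non-silent background segment is long enough")
    return best[0], best[1]


def mix(speech: Binaural, background: Binaural, start: int) -> Binaural:
    """Add the reverberant speech to the background, ear by ear, at the given offset."""
    n = len(speech[0])
    return [[s + b for s, b in zip(sp, bg[start:start + n])]
            for sp, bg in zip(speech, background)]


if __name__ == "__main__":
    # Toy data: a 4-sample "utterance", trivial impulse responses, and a
    # background whose level changes every 4 samples (loud, quiet, medium).
    speech = spatialise([1.0, -1.0, 1.0, -1.0], brir_left=[1.0], brir_right=[0.5])
    bed = [1.0] * 4 + [0.1] * 4 + [0.5] * 4
    background = [bed[:], bed[:]]
    start, snr = best_offset(speech, background, target_db=5.0, hop=4)
    print(start, round(snr, 2))
    print(mix(speech, background, start))
```

In the toy run the search should pick offset 8 (about 4.0 dB), the medium-level stretch of background. Choosing a position rather than applying a gain keeps both the speech and the background at their recorded levels, which is one way to reconcile the corpus's two goals. For real audio, an FFT-based convolution such as `scipy.signal.fftconvolve` would replace the quadratic loop in `convolve`.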